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Optimizing the stable behavior of parameter-dependent 
dynamical systems — maximal domains of attraction, minimal 

absorption times 
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Abstract 



^ We propose a method for approximating solutions to optimization problems involving the global 

stability properties of parameter-dependent continuous-time autonomous dynamical systems. The 
method relies on an approximation of the infinite-state deterministic system by a finite-state non- 
deterministic one — a Markov jump process. The key properties of the method are that it does not 
use any trajectory simulation, and that the parameters and objective function are in a simple (and 
except for a system of linear equations) explicit relationship. 

o 

ri 1 Introduction 

^ The problem. In numerous applications one is faced with the problem of steering a continuous time 

G dynamical system which depends on some fixed (control) parameters into a target region. The word fixed 

refers to the case where these parameters can be set at the beginning, but they cannot be changed once 
psj the system is "running" . 

Not only the asymptotic behavior (i.e. entering the target region) is of interest. Dealing with the 
nonlinear behavior of the system outside the target region is also important, it is desirable to achieve 
some kind of optimal stability properties; e.g. the domain of attraction^ (DOA) of the target region 
should be maximal, or the average time the system needs to reach the region should be minimal. 
• The literature extensively studies domains of attraction and their numerical computation (cf . below) , 

I there are even approaches aiming at the local enlargement of these objects [DK71, SS75, LL06]. It seems 

however, that little to no effort has been done in considering this task as an optimization problem, stated 
T-H above. The current work proposes a method tackling this particular problem, and acting thereby globally. 



Possible tools and previous work. Numerical methods computing the DOA of a particular system 
can be roughly divided into three major groups. 

• The first are methods using direct simulation (forward or backward) of the underlying ordinary 
differential equation [GTV85, HsuSO, IIG80, FG88, GriiOl], in order to obtain a set covering the 
DOA, or some approximation on its boundary. 

• The second are Lyapunov function based techniques, e.g. [LL61, Gie09] or Zubov's method [Zub64] 
(see also in [HahG?]). Usually they allow one to extract a subset of the DOA as a level set of a 
function solving some partial differential equation, and no trajectories have to be simulated. 
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• The third group consists of probabilistic approaches, with trajectory simulation [GNW04], or with- 
out [Kol]. Absorption probabilities for an approximate non-deterministic system reveal properties 
of the DOA. Probabilistic approaches for solving stochastic control problems have been studied by 
Kushner et al. [Kus77, KD92]. 

In order to view the stability behavior of the system as an optimization problem, we choose a simple 
setting: let a real valued objective function be given, which depends on the stability properties of the 
system in a desired way; such a function could be e.g. the Lebesgue volume of the DOA. To make the 
optimization more convenient to carry out, we assume the dependence of this objective function on the 
control parameter to be sufficiently smooth (at least differentiable) , that we can use a simple gradient 
method to get to a (local) maximum/minimum. 

Along the lines of the above example (maximization of the volume of the DOA) we would like to 
highlight some difficulties arising if we try to use one of the above methods for optimization by the 
gradient method. 

For methods using direct simulation, computing the gradient of the objective function will require by 
the chain rule the computation of derivatives of the flow with respect to (w.r.t.) the parameters. Thus, 
variational equations for the underlying ordinary differential equation (ODE) have to be solved. These 
are computationally expensive, since they are in general higher dimensional than the original ODE^, and 
they also require the derivative of the vector field (which also has to be computed numerically) . 
The Lyapunov function based techniques deliver only the boundary of the DOA as level set of some 
scalar function. It is not clear how to obtain the derivative of the level set w.r.t. the parameters, and if 
there is a way to devise an efficient method based on this. Further, the most Lyapunov function based 
techniques deliver merely a subset of the DOA which' topology depends on the characteristics of the 
chosen Lyapunov function class (e.g., the level sets of quadratic functions are always ellipsoids). 

The considerations above should give the reader a first insight into what one has to deal with when 
trying to solve an optimization problem associated with the DOA. Nevertheless, the main purpose of 
this work is not the comparison of the different approaches one could use for optimizing the objective 
function, rather to analyze and show the feasibility of the method we chose. 

The approach. Our method is based on a set-oriented approach to approximate properties of dynam- 
ical systems [DH97, DJ98]. We partition the state space X into disjoint sets, which serve to discretize 
the original deterministic dynamical system, and relate it to a finite state non-deterministic system — a 
Markov jump process. Then we construct the infinitesimal generator (matrix) of the latter by integrating 
the vector field of the ODE on the boundary of the partition elements. From this matrix we compute 
absorption probabilities for the finite state system, which yield the desired information about the DOA; 
or we compute expected absorption times, which serve as approximations of the absorption times of 
the original system. For example, the volume of the DOA is simply the integral over the absorption 
probability function associated with the original deterministic system.^ 

All of the involved computations are explicit (except for the solution of a system of linear equations) , 
and it is easy to obtain their derivatives. Composing these derivatives according to the chain rule yields the 
derivative of the objective function w.r.t. the parameters, and allows the application of local optimization 
methods, e.g. the gradient method. 

Outline. This work is structured as follows. Section 2 collects the mathematical background on deter- 
ministic and non-deterministic dynamical systems which are going to be used afterwards. In Section 3 
we introduce the discretization we use to make a finite-state non-deterministic system from the original 
infinite-state deterministic one. Section 4 is an outlook: it collects possible alternatives for the approxima- 
tion of the absorption times of the deterministic system by quantities derived from the non-deterministic 
one. Section 5 is the main part of this work stating the optimization problem and showing how we 

^They are, in fact, an ODE for the so-called sensitivity matrix w.r.t. the parameters. The dimension of the sensitivity 
matrix is the state space dimension times the parameter space dimension. 
^Since this is 1 if a given point is contained in the DOA, and otherwise. 
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propose to solve it. Numerical examples follow in Section 6, and some concluding remarks are collected 
in Section 7. 

2 Background 

2.1 Dynamical systems 

The temporal variation of states x{t) e M'* we are interested in is given by an (autonomous) ODE 
X = v{x), where w : M'' — )■ M'*, and i '■= ^ denotes the temporal derivative. Let 0* denote the associated 
flow, i.e. x{t) = (j)*x{0). The set {(j)*x}t>o is called the (forward) trajectory (of x). 

We are only looking at a compact subset X of the whole state space M''. If a trajectory leaves X it 
immediately terminates; in this case we write cjj^x — lu, where w is a fictive state representing everything 
outside X. 

We call the pair (<¥,(/)*) a dynamical system, and we refer to X as state space. Later, we will focus on 
systems for which some states shall be steered into a target region T C A" in a prescribed way. Following 
objects will be at the center of our attention. 

Definition 1. For the target region T <Z X we call the set 

V := {x e X\(t>^x eT for at>0 and (j)''x G X Ws € [0,t]} 

the domain of attraction of T. 
The quantity 

t{x) 

is called the absorption time of x. 

Note that we are only interested in trajectories which stay all the time in X , until being absorbed 

in r. 

2.2 Markov chains 

We use the theory of Markov chains to describe non-deterministic dynamical behavior. From now on in 
this section let 3^ = {1. 2, . . . , n} be a discrete state space. We also work with a given probability space 
(f],^,P). 

Definition 2 (Stochastic process). LetX = N orX = M>o, and let y be the set of possible states. Then 
a family {Z*}tex, where : fl ^ y for all t G I, of y -valued random variables is called a stochastic 
process. 

Definition 3 (Discrete time Markov chain). We call a stochastic process {Z^^teT <^ discrete time Markov 
chain, if T = N, and P G M"^" with (i) P^j > and (ii) — 1; describes the transition 

probabilities, i.e. 

Vji ^ P = i \Z^ =i) -.^V {{Z*+^ = 3 provided Z* = i}) , 

for all t € I and i,j G y. The matrix P is called the transition matrix. A matrix satisfying (i) and (ii) 
is called sub- stochastic. If (ii) holds with equality, P is called stochastic. 

Definition 4 (Continuous time Markov chain). Let I = M>o. Further, let G G M"^" be such that 
(i) Gij > for i ^ j , and 

(ti) Er=i G^, < 0. 



_ J inf{t>0|(/i*a;Gr}, xeV, 
oo, elsewise. 
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Define P* = e**^ — X^fcLo * fcf ■ T'^er* P* zs a sub- stochastic matrix for every t > 0.^ A process {Z*}tgi 

P*,^ = P(Z*+^ j I = i) for all t, s > 0, and i,j ey 

is called a continuous time Markov chain. It is also often called a Markov jump process (MJP). The 
matrix G is called the (infinitesimal) generator of the process. 

One may think of such MJPs as follows (cf. Section 2.6 in [Nor97]). Being in an arbitrary state i at 
an arbitrary time s, the process remains at the given state until some random time s + i, when it jumps, 
independently of t, at random to another state j. The jump time t has exponential distribution with 
parameter G^i, and the probability of jumping to state j (j ^ i) is —Gji/Ga^ unless G^,; = 0, in which 
case the process does not leave the state i ever. 

Consider now a discrete time Markov chain with transition matrix P. If P is sub-stochastic but not 
stochastic, there is an i G 3^ such that X]je:v Pji — Pi < ^- In other words, if the process is currently in 
state i, there is a positive probability 1 — pi that it will not end up in y in the next step — the process 
terminates. Analogous considerations can be done with a continuous time process. Its generator will 
satisfy '}2,j^y ^ji < for some i G y. We call such processes leaky. They reflect the restriction of our 
interest to the state space: everything leaving the state space is considered to be lost. 

In the following we denote a Markov chain already terminated at time thy = w. Here lo represents 
"everything outside of the state space" — just as in the deterministic setting, Section 2.1. 

2.3 Absorption probabilities and expected absorption times 

Throughout this section we consider MJPs. Furthermore, we assume that the set T C 3^ is absorbing (i.e. 
P(Z'* G T I 2'° G T) = 1), and that the process is possibly leaky. Then, under a reachability assumption, 
the process will either end in the absorbing set T, or "leak out" (i.e. terminate in the Active state w; cf. 
above). ^ Let p G [0, 1]" denote the absorption probabilities for the absorbing set, i.e. 

p,^V ({Z' e r for some t > 0, provided Z° = i}) . 

The corresponding absorption time 

A = M{t\Z* e r} 

is a random variable itself; here we are interested in its expectation. The expected absorption times are 
defined as 

a, = Ei{A) ■.= E{A\Z" = i) . 
An important quantity will be the expected termination time. The termination time 

r = inf e TU{uj}} 

is a random variable measuring the time to termination cither in the state uj or in the absorbing set T. 
We define 

t,^E,{T). 

We wish to compute p, a and t from the generator G of the process. 

Set G := Gy\-j- y\-j-; i.e. G is the matrix with the jump rates from y \ T to y \ T. The transiency 

of the set 3^ \ T is equivalent with limj^co e**^ ~ 0. 

Proposition 5 ([Nor97] Theorem 3.3.1). Assume limi_j.oo e**^ = 0. Then the absorption probabilities are 
the unique solution of 

P^ ^ I, ^er. 

*For a proof, see Theorem 2.1.2 in [Nor97]. 

°This reachability assumption is that there is a finite time t > such that P (Z* e T U {lj}) > for any starting state. 
It imphes that all the states y \ T are transient; see [Nor97] Theorem 3.4.2 p. 115. 
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The main difference between the expected absorption times and termination times can be seen without 
computation. It holds = oo, but — 0. A representation of these quantities involving the generator 
has to respect this fact. 

Proposition 6 ([Nor97] Theorem 3.3.3). Assume lim(_5.oo e**^ = 0. Then the expected termination times 
are the unique solution of 

t, = 0, ieTu{uj}. ^ ' 

Along the lines of the proof of Proposition 6 we obtain 
Corollary 7. Assume limt_>.oo e**^ = 0. Then the expected absorption times are the unique solution of 

'^j£y\Tu{u,}°']^j'>- = «<s3^\T, 

a, = 0, ieT, (3) 
ai = oo, i = uj, 

where G^i := — G^^ — X]je:v ^jij termination rate from state i. 

As one can see, the quantities p, a and t can be computed by solving systems of linear equations. 

There is a problem with a which will pose us difficulties when computing absorption times from a 
discretization of the system. Starting at i, if there is a non-zero probability (does not matter how small) 
of terminating in uj at some future time, the expected absorption time will be infinite. Even if these non- 
zero probabilities come from discretization errors, it makes the resulting absorption times not applicable 
for the approximation of absorption times of the original system. 

Note that a^p^ij = i.e. a and t coincide for the states which have absorption probability one; 

clearly because these states terminate in T. 

Alternative measures of absorption times are introduced in Section 4 below. 



3 Discretization 

The desired objects of the deterministic system defined by i = v{x) on X will be computed from the 
discretization given below. 

Partition the state space X into finitely many disjoint partition elements Xi,...,Xn, where each 
set Xi has a piecewise smooth boundary dXi, such that the unit outer normal vector Uj exists almost 
everywhere (measured by the d—1 dimensional Lebesgue measure m^^i on dX^). Usually, the Xi are 
rectangles or simplices. 

Definition 8. The discrete generator matrix G„ G M"^" associated with the vector field v is defined by 

^ ^1 il/m{Xj)) Ssx^nax, ("(^) • dnia-iix), i ^ j 

\ -{l/m{Xi)) fg^^ {v{x) ■ ni{x))^ dmd-i{x), i = j, 

where x ■ y denotes the dot product of the vectors x and y, and /+ denotes the positive part of the function 
/• 

The discretization goes back to [F.JK12], but apart from the probabilistic point of view, it is only 
the spatial discretization of the upwind method known from finite volume methods [LcV02]. A first 
application of this idea for the computation of the DOA appears in [Kol] . It is shown in the latter work 
that G„ indeed generates a M JP on the state space with elements Xi, . . . , Xn, this M JP can be associated 
with a non-deterministic system on X which converges to 0* in distribution as n — > oo. This elucidates 
why we use absorption probabilities and expected absorption times of the MJP generated by G„ in order 
to approximate the DOA and absorption times of (/)*. 

Due to this convergence in distribution if one starts the MJP on a sufficiently fine discretization with 
a distribution highly concentrated around a state x € 2?, the distribution of Z,* is highly concentrated 
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around 



Now, if t is chosen such that (p* E T (it is possible per definition of T) then is already 



absorbed with high probability. This tells us intuitively that for a sufficiently fine partition « 1 if 
Xi C T); and similarly that the expected absorption times of approximate the absorption times of 
well.6 

Remark 9. The consideration here should help to associate the system cj/ with the MJP generated by G„. 
Let Z be a random variable distributed uniformly in X^. Then we have by Lemma 4^.1 and Lemma 4-4 
in [FJK12] 



dt 



Lf now {Zj^}t>Q is the MJP generated by G„, we also have from the definition that 



dt 



izi 



Xj I z„ 



X, 



Note that unless the partition elements Xi and Xj have a fully d—1 dimensional intersection dXi H dXj , 
it holds Gn,ij = 0. Thus, G„ is sparse. 

Another numerical advantage of the discretization is that it does not use trajectory simulation — only 
integrals of at least continuous functions (the (v-Uj)^) on d—1 dimensional domains have to be computed. 



Remark 10 (The "wrapping effect"). Having seen how 
the discretization works we can go back to the effect men- 
tioned at the very end of Section 2. The figure to the 
right shows a trajectory of a stable system spiraling into 
one point, say, the origin. The stability is weak, since 
the decrease of the distance to the origin during one ro- 
tation is small. The rectangles in the figure indicate the 
elements Xi of the chosen partition. Computing the dis- 
crete generator G„ of this .system, one can see, that no 
matter in which partition element one starts in, there is 
always a positive probability to leak out of the state space. 
This effect decreases as the partition gets finer, it does 
not vanish completely however, so the absorption times a„ 
computed from G„ will be infinity for all boxes except the 
target (per definition). To remedy this fact one can work 
with the termination times, or with other constructs; see 
Section 4 below. 



The "wrapping effect": no matter how 
small the partition elements are, there is 
always a nonzero probability for the in- 
duced MJP to leak out. 



4 Alternative measures for the absorption time 

It has been already discussed that the expected absorption times a„ of the discrete generator G„ are not 
the right tool to approximate the absorption times t of the deterministic system. Also the approximation 
through the expected termination times i„ may be highly defective; e.g. when the time to absorption (if 
absorption occurs) and the time to leak-out (if leaking out occurs) are way apart. 

We would like to discuss here two alternative approximations to r by some quantities derived from 
the discrete generator which only depend on absorption times; i.e. on the random variable A. Both are 
solutions to a system of linear equations, thus their computation is not a harder task than that of the 
previously used quantities. Their incorporation into our approach presented below, and the analysis of 
their numerical properties are subject of future work. 

^The reader may have already noticed that we are avoiding precise statements about the convergence of absorption 
probabihties and absorption/termination times. We decided to do so due to the fact that proofs require more advanced 
mathematical tools, which' introduction would highly complicate the presentation and distract the attention from the main 
purpose of this work. Precise statements and their proofs are subject of ongoing work, and will appear elsewhere. 
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Transformation. Let us recall that A denotes the random variable of absorption times. Since 

F{A = oo) > 0, we define 

K = E. (e-^) . 

The advantage is that h attains always finite values, even if the expected absorption time a doesn't. The 
transformation r i— >■ 1 — e""^ is called the Kruzkov transform. Note that 1 — e""^ is continuous on the 
entire state space A" if r is continuous on V. 

One can show that for a MJP with generator matrix G holds 

/i. = 1, zeT. ^' 

If hn is the solution of (5) corresponding to the discrete generator G„, we may use — log(/i„) to approx- 
imate T. 



Absorption times conditioned to absorption. After finishing this work the following idea of P. 
Pollett and D. M. Walker has been pointed out to us. 

If we expect a box to lie in the DOA but the corresponding absorption probability is smaller than 1, 
the expected absorption time conditioned to that absorption occurs in finite time [Wal98] is a very natural 
approximation to r. Hence, define 

a* = E, (A I A < oo) 

on 

y* = {iey\p^>Q], 

where p denotes the absorption probabilities. Then by Theorem 2 [Wal98] we have 

a* = 1, leT. 

Since the magnitude of the pi may range over several orders of magnitude (in practice between machine 
precision to one), any numerical method using (6) has to address stability issues. 



5 Application 

5.1 Parameter dependent system and objective function 

The knowledge gained in the previous sections will be now applied to optimize the stable behavior of 
dynamical systems arising from the vector field v{x; &), where 6 € K'' is an adjustable parameter. 

The quality of the dynamical behavior is going to be measured by an objective function / : M'" ^ M 
which will be subject to minimization or maximization w.r.t. b. This objective function will consist of two 
parts; one representing the desired behavior, and one representing the costs connected with the choice of 
the control parameter. Our two model problems are as follows. 



Maximal DOA. The goal here is to obtain a subset of the state space with the biggest possible 
volume which is steered into the target region T- Correspondingly, one would like to maximize the 
volume m{T>{h)) of the set the DOA associated with v{-:b). We define the objective function as 

f(h):^m{V{h))-a\h\\ (7) 

with a > being some penalty parameter, and |6p :~ ^1- 

Observe that m(T>{b)) — X'D{b)ix) dx, where x-D(b)ix), the characteristic function of the set 'D(b), 
acts as a function of absorption probabilities for the deterministic system: if x is steered into T then 
Xv(b)ix) = 1, and otherwise. Suppose a partition {Xi, . . . , Xn} satisfying the conditions of Section 3 is 
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given. By computing the absorption probabilities Pn from the discrete generator associated with v{-;b) 
for this partition, we obtain an approximation of the objective function; 

n 

f{b) « fn{b) ^^^i)Vn,^ ~ Oi\h\^- 

Our strategy is to use this approximation in order to carry out the maximization. 

Minimal absorption times. Here we have a prescribed set D T, and the goal is to minimize the 
average absorption time over this set. The objective function is defined as 

f{b) / r{x;b)dx + a\b\^ (8) 

JVo 

with some penalty parameter a > 0; where r(-; b) is the function of absorption times associated with b). 
The approximation of this objective function is going to be discussed in Section 5.3 below. 

Remark 11 (Other objective functions). Depending on the needs of the application one can work with 
other objective functions as well. Here we present two of them which can be handled by our method. They 
are not going to be discussed further in the following. 

(a) If it is of higher importance that the DOA covers some specific regions, one could take some function 
9 : X ^ R to weight absorption probabilities. The objective function 

f{b) = [ 9{x)dx~a\b\^ 

Jv{b) 

has a clear representation in terms of absorption probabilities. 

(b) Average absorption speed maximization over a domain Vq of interest: 

f{b)^ [ 

5.2 Iterative optimization 

In this section we describe the optimization procedure for finding (approximate) local extrema of the 
objective function / by performing a local optimization of its approximation /„. For the demonstration 
we choose the problem of DOA maximization. 

Suppose we have chosen a partition {Xi, . . . ,Xn} of the state space X such that the target region T 
is the union of some partition elements. Later on we will use the short-hand notation i G T for Xi C T. 
Set /„(5) = X]"=i m{^i)Pn,i - a\b\'^. 

Next we show that under fairly general conditions the objective function /„ is differentiable w.r.t. b. 
Then, by the gradient method 

bk+i=bk + Dfn{bk), fc = 0,l,... 

we can compute a local maximum of it. Here and in the following Dg denotes the derivative of the 
function g. The value fn{b) is computed by the sequence of mappings 

b v{-; b) ^ G„ ^ Pa ^ fn{b). 

The differentiability of all these mappings would imply the differentiability of /„ , but milder conditions 
suffice as well. Let G„(w) denote the discrete generator associated with v. Since the number of partition 
elements, n, does not change throughout the computation, for simplicity we drop the subscript n. 

• b^ V 

A weaker assumption than the differentiability of v{-] b) w.r.t. b suffices here, see Remark 13 below. 

• W H> G 
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Lemma 12. Let v be a continuous and 6v a bounded vector field from X to . Further assume that 
md-i{dXi f] {v ■ Ui — 0}) = for all i = I, . . . ,n, where Ui denotes the outer normal on dXi. Then the 
directional derivative of G{v) in the direction Sv, 

J.^+Sv{x)- Uj (x) dma^ i{x), i^j 
Proof. We have for i j 




where o(e) denotes a function g{e) such that g{s)/e — > as £ — >■ 0. The second equation follows from 
the uniform continuity of v on compact sets and from the boundedness of 5v, the third one from the 
condition md-i{dXi {v ■ Ui — 0}) = 0. The proof for i = j is the same. □ 

Remark 13. The previous lemma still holds if 6v is only defined on X \ J\f, and the set J\f satisfies 
md-i{dXi nAf) = for every i; i.e. there is no fully d — 1 dimensional intersection between the boundaries 
of the partition elements and the set of points where Sv is not defined. The application we have in mind 
is the case where v is continuous but only piecewise differentiable w.r.t. b. Such a case may arise if due 
to some technical limitation the control has a maximal amplitude; we will show this kind of an example 
later on. 

• G i-^ p ^ 

Next we address the differentiability of p w.r.t. G. As in Section 2.3, let G denote the part of G with the 
jump rates between the Xi which are not contained in T. Let qi — J2jeT ^i^- denotes the absorption 
probabilities corresponding to partition elements contained in A" \ T, Proposition 5 shows that 

p - -G-^q. 

Hence p is arbitrarily smooth in the entries of G and we get by elementary calculus 
Lemma 14. The directional derivative of p(G) in the direction 5G is given by 

DpiG) ■ 6G = G-^SG^G-^q - Q-^Sq; 

and Dp,{G)^Q fori eT. 

• p^ f{b) 

Finally, the first sumniand of f(b) is linear in p, thus differentiable; and the second summand is clearly 
differentiable w.r.t. b. 



is given by 

DG{v)ij ■ Sv = 
where X+ = dX, n dX^ n {v ■ > 0}. 
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Summary: optimization procedure. Under the assumptions made in this section the objective 
function / is difFerentiable, and we sum up the optimization procedure by the gradient method as an 
algorithm. 
Initialization: 

Let some partition {Xi, . . . ,Xn} of X be given. Choose some 60 € W such that T C I?(6o)j a tolerance 
threshold TOL > 0, and step sizes 70, 71, . . .. Set 

n 

/(fc) = ^m(A',)p,-a|6|2. 

For k — 0,1, . . . perform the following steps: 

1. Compute Df{bk)- 

2. STOP if \Df{bk)\ < TOL. 

3. Set bk+i = bk+ lkDf{bk). 

Remark 15. There are methods exploiting smoothness with higher performance than the gradient method 
(e.g. the Gaufi-Newton method), however it is not clear at the first sight if the mapping u 1— > G is 
differentiable more than once. Also, there are even more sophisticated first order methods than the simple 
gradient method [Nes83]. Using these methods for the optimization of our problem will be the subject of 
future work. 

5.3 Minimal absorption times 

We have already discussed at the end of Section 2.3 that the expected absorption times, a„, computed 
from G„ may be inadequate for approximating r. We also noted that a„^i = tn,i if Pn,i = 1, hence we 
expect tn to be a good approximation of t on T). Under the same assumptions as in Section 5.2, and by 
using the short-hand notation i e 2?o for Xi C "Dq, we aim to minimize the objective function 

fn{b) HXi)tn,^ + a\b\'' 

ieVo 

for some prescribed region I?o- Just as above, we drop the subscript n for simplicity. 

The strategy is analogous to the one in the previous section: we establish the differentiability of / 
w.r.t. b, compute the derivative Df[b), and apply the simple gradient descent method 

bk+i^bk~"fkDf{bk), fc = 0, 1,... 

to compute a local minimum of the objective function. 

The differentiability of / is proven along the same lines as in the previous section, except for Dt{G). 
Denote by i the vector with entries ti whith i E X \ T- 

Lemma 16. The directional derivative oft(G) in the direction 5G is given by 

Di{G) ■ 6G = G-^6G'^G-^e, 

where e — {1, . . . , 1)"^. 

Further holds Dti{G) ^OforieT. 

Proof. We have from (2) that i ~ — G^-^e. Differentiation w.r.t. G yields the first claim. The second 
follows from t,;(G) = for i G r. □ 
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Note that unless there is a 6 such that Vq C I?(6), we cannot expect the objective function to obtain 
finite values, since t(x; b) = oo for x ^ T^{b)- To exclude such a case we assume that we already start the 
optimization with a such that I?o Q 2-'(6o).^ Further, we have to assure that none of the ht is such 
that I?o ^ T^{bk)- Observe that Vq C 'D{bi^) is equivalent with 

if I?o is the union of partition elements; what we assume from now on. Thus, we have to assure that the 
sequence {5(6fe)}fc=o,i,... is constant, in particular non-decreasing. If the increment A6fc := Df{bk) in the 
iteration of the gradient method is small enough, we have 

gih+i) ~ g{bk) ~ Dg(bk) ■ Abk, 

and the condition Dg{bk) ■ A6fc > assures that the above sequence is essentially non-decreasing. Note 
that the computation of Dg can be done similarly to that of Df in Section 5.2. So if Dg{bk) ■ Abk < 0, 
we use the projection of Abk onto Dg{bk)^ := {x &W \ x ■ Dg{bk) = 0} as increment. 

Summary: optimization procedure. Under the assumptions made in this section the objective 
function / is diffcrcntiablc, and we sum up the optimization procedure by the gradient descent method 
as an algorithm. 

Initialization: 

Let some partition {Xi, . . . , X^} of X, and some Vq D T he given. Choose some b^ G W such that 
2^0 £ 2?(6o), and a tolerance threshold TOL > 0. Set 

f{b) - HX^)t^+a\b\\ 

For fc = 0, 1, . . . perform the following steps: 

1. Compute Abk Df{bk) and Dg{bk)- 

2. STOP if \Abk\ < TOL. 

3. IF Dg(bk) ■ Abk < 0, set 

a;, ^^fc ■ ^9{bk) ^ ^ 

Abk Abk in (u \\2 ^dih)- 

\Dg{bkW 

4. Set bk+i = bk - Abk. 

Remark 17. There are more established methods for constrained optimization than the one we use here. 
Their application, however, is beyond the scope of this work. 

5.4 Parameter-afRne systems 

Here we consider the model problems for the case where «(•; b) is affine-linear in b; i.e. 

v{x;b) = vo{x) +Vc{x;b), 
with Vc{-] b) being linear in b. Take for example 

v{x; b) = vo{x) + Vc{x)'Bx, 

where vq and the columns of Vc are vector fields from X to M'^, and B is obtained from b by reshaping 
the vector to a matrix. 

^We may find such a bg by taking the maximization problem from Section 5.2 with the objective function f{b) = 
fvo X'D(b) ~ Cfl^P foi' some very small a > 0, with its discrete counterpart f(b) = J^igCo rn{Xi)pi{b) — a\b\'^. 
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For this class of vector fields an additional approximation step may save a huge amount of compu- 
tational efforts. Let G W, i = l,...,r, be such that Ci^i — 1 and Cij = for j ^ i. Then, by 
linearity, 

r 

v(x\ b) = Vu{x) + ^ biVc{x; e^), 

and we use 

r 

G*M ■■= GnM + J2 \b^\Gn (sign(6, )z^c (• ; 6.)) (9) 

i=l 

instead of G„(w) in our computations.^ The reason why G* (w) is not simply a linear combination is the 
fact that linear combination of generator matrices does not have to be a generator matrix. Conical com- 
bination of generator matrices, however, is a generator matrix. Unfortunately, differentiability of G* (w) 
w.r.t. bi at bi — is not guaranteed any more. 

The usage of G* (u) has the advantage that the computationally expensive steps of computing the 
generator matrix for a vector field have to be done only once, at the beginning (2r -I- 1) times, and not 
for all iterates bk in the gradient method. This brings a massive speed-up in runtime at the expense of 
small loss in accuracy. 



6 Numerical examples 

6.1 Example: Domain of attraction maximization 

Consider the two dimensional dynamical system given by 

" ^ ' V ' 

with saturation condition |jwc(a;; 6)|joo < 0.3. Here b is the parameter vector containing the entries of the 
parameter matrix B G M^^^. We chose the state space to he X = [—1, 1] x [—1, 1], and the target region 
tober= [-0.05,0.05] x [-0.05,0.05]. 

The goal is to maximize — a||&||2i with a = 0.02. For this we use the method described in 
Section 5.2 (see Remark 13 on how to handle the above saturation condition) with step sizes 7^ = 3. 
Note that the linearization of system (10) around the origin yields ^ = — B^. We start the iteration with 

Bo ^ 1^ ' '^^ ^'^ gradient steps. 

In Figure 1 we show the absorption probabilities corresponding to the system with parameter matrix 
B15 (for the one computed with the gradient method for the particular partition, respectively), from left 
to right for a 64 x 64, a 128 x 128 and a 256 x 256 uniform partition of the state space. 

Table 1 shows the change in the objective function after 15 gradient steps for the different partitions. 
For all three discretizations ||-D/(6i5)||2 < 10~^. The objective function increased by 15% compared with 
the naive initial parameter array computed from linearization. 

There is no reason to expect the objective function to have only one local minimum (which will be the 
global one as well). Thus we applied the algorithm to different initial parameter values. The resulting 
sequence, however, converged always to the same optimum. 

To save computational efforts one could start an optimization with a coarse resolution, and successively 
change to a finer partition at some appropriate iteration step. 

*Note that in general G„{v) is not linear in v, but it approximates the dynamics in a distributional sense, cf. above, 
hence we expect it to behave linearly "in the limit" , i.e. if n is large enough. Any further discussion on the topic would 
lead beyond the scope of this paper, involving semigroups of transfer operators associated with flows; we refer the reader 
to [FJK12]. 
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Figure 1: The absorption probabilities corresponding to the parameter 615 for a 64 x 64 (left), a 128 x 128 
(center) and a 256 x 256 (right) uniform partition of the state space. The contour indicates the border 
of the DOA computed by direct simulation. 





64 X 64 


128 X 128 


256 X 256 


fibo) 


0.5888 


0.5970 


0.6026 


/(615) 


0.6768 


0.6862 


0.6924 



Table 1: Initial and final values of the objective function for the three different partitions. 
6.2 Example: Absorption time minimization 

Consider system (10) with the same saturation condition as above. Let X = [^I^IIj 
r = [-0.03,0.03] X [-0.03,0,03] and Vq^ {x£R^\ \x\ < 0.3}. 
The goal is to minimize 

f{b)= [ T{x)dx + a\b\\ 

with a = 0.02. For this we use the method described in Section 5.3 with step sizes 7^ = 3, and 15 steps. 
We start the iteration at the optimal parameter value computed for the maximal DOA above, which 
. „ /0.89 0.35\ 
''^°=[0.75 1.4 j- 

In Figure 2 we show the optimal absorption times computed with the gradient descent method for 
three different partitions. In every case the optimum seems to occur at the boundary of the feasible 
region {6 | I?o C I?(6)}. This matches well with the fact that the boundary of Vq (indicated by the 
circular contour in the figure) is very close to the boundary of the set with high absorption probabilities 
(the plotted boxes indicate absorption probabilities greater than 0.9). 

After 15 steps the gradients (more precisely, the gradients projected to Dg{bk)'^, since the iteration 
converges to the boundary of the feasible set; see Section 5.3) were for all discretizations smaller than 
4 ■ 10"'^. Table 2 shows the change in the objective function after 15 gradient steps for the different 
partitions. It is worth to note that for different starting points Bq the iteration may run into different 





64 X 64 


128 X 128 


256 X 256 


fibo) 


1.620 


1.142 


0.9418 


/(615) 


0.5278 


0.4750 


0.4436 



Table 2: Initial and final values of the objective function for the three different partitions. 

(but very similar) local minima which all lie at the boundary of the feasible region |6 | C I?(6)}. 
Also, for finer partitions these local minima are closer to each other. This suggests that the occurrence 
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0,5 1 



0,5 1 



0,5 1 



Figure 2: The absorption times corresponding to the parameter 615 of the discrete generator computed 
on a 64 X 64 (left), a 128 x 128 (center) and a 256 x 256 (right) uniform partition of the state space. 
Only those boxes have been plotted which have an absorption probability more than 0.9. The contour 
indicates the region Vq. 



of multiple minima is only due to the finite discretization of the problem, and they all collapse to one 
minimum as the diameter of the partition elements tend to 0. 

6.3 Example: AfRne parameter dependence 

We test the modification introduced in Section 5.4 on the system 



Xi 



3X9 



+ 2X2 

2xi + 3xi + X2 



50x1 



-1 






-0.1 



B 



(11) 



which has the desired affine dependence on the parameter array b. 

First we compute the discrete generators corresponding to the system (11) with parameter 

Bo = by the direct discretization (4) and by the modified discretization (9), respectively. 

The state space is [—1, 1] x [—1, 1], and we apply a 40 x 40 uniform partition in both cases. Next, we 
compute the absorption probabilities for the target T— [—0.05,0.05] x [—0.05,0.05] using the one and 
then the other discretization. Figure 3 shows the results. To understand why does the modified dis- 




0,5 1 



Figure 3: Absorption probabilities computed using the standard discretization (left) and the modified 
discretization (9) (right). The larger diffusivity of the modified discretization is reflected in the milder 
descent of absorption probabilities. 



cretization give more "blurred" absorption probabilities, note the following. The standard discretization 
takes a linear combination of some vector fields and computes the discrete generator from it. The modi- 
fied method takes discrete generators of some vector fields and combines them linearly to one generator. 
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This means for one element in the partition, that if the component vector fields v{-,ei) show locally in 
many different directions, the combined generator of the modified discretization will yield transition rates 
in many different neighboring partition elements. Hence, the generator from the modified discretization 
may have a larger "diffusivity" , forcing absorption probabilities of neighboring boxes to be closer to each 
other. This is what we observe as blurring in the figure. 

This diffusivity effect decreases as we use finer partitions. Applying 15 steps of the gradient method for 
the problem of maximizing — 0.02|6p with a 256 x 256 partition and Bq as above, the difference in 
the objective fimction computed by the two discretizations is only 3.5%, and the corresponding absorption 
probabilities are shown in Figure 4. Note that the modified method computes 9 generators at the 
beginning, and then no more for the whole iteration. Since in this example approximately 45 steps are 
needed that the gradient of the objective function falls under 10^"^ (the step sizes are constant, 7^ = 3), 
the modified method has a notable advantage over the standard one. 




'\ -0.5 0.5 1 ° '\ -0.5 0.5 1 



Figure 4: Absorption probabilities for system (11) with 615 computed from the standard discrete gen- 
erator G{v) (left) and from the modified discrete generator G*(?;) (right) on a 256 x 256 partition of 
[— 1, 1] X [—1, 1]. The corresponding objective functions differ only up to 3.5%. 

Remark 18. If the state space is partitioned into N elements (and if there are 0{N) of them not belonging 
to the target), then the computation of the discrete generator has 0{N) computational complexity, while 
the computation of absorption probabilities and times has 0{N'^) (due to the solution of a system of 
linear equations by LU- decomposition). For the examples presented here the computation of the discrete 
generator was always by far the computationally most expensive step, however for finer partitions, and 
especially in higher system dimensions, the computational costs of the solution of the linear equations 
may dominate. These computations can be carried out more quickly by hierarchical GMRES techniques 
(see [KollO], Section 5.7.4), so that the modified discretization for the case of linear dependence on the 
parameters still induces a considerable speed-up in the optimization against the standard discretization (4). 

7 Conclusions 

We have proposed a method for the optimization of the global stability properties of time-continuous 
autonomous parameter-dependent systems. It uses an approximation of the deterministic dynamics by a 
Markov jump process (essentially the spatial discretization of the upwind method). The main computa- 
tional properties of the method are that no trajectory simulation is needed, and that the computation of 
the objective function and of its derivative is simple and consist mostly of explicit steps. If the system is 
affine-linear in the parameters, a considerable speed-up is achieved by a slight modification. 

A quantitative performance analysis of the method, in particular convergence statements, are subject 
of future work, just as the incorporation of adaptivity. The advantages of combining the basic idea with 
some more sophisticated optimization approaches are also to be investigated. 
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